Uncovering hidden duplicated content in public transcriptomics data

Authors

  • Marta Rosikiewicz
  • Aurélie Comte
  • Anne Niknejad
  • Marc Robinson-Rechavi
  • Frederic B. Bastian
Abstract

As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data of different types and from different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we identified duplicated content in GEO and ArrayExpress, affecting ∼14% of our data: experiments fully or partially duplicated across independent data submissions, Affymetrix chips reused in several experiments, and chips reused within a single experiment. We present here the procedure that we have established to filter such duplicates from Affymetrix data, and our procedure to identify potential future duplicates in RNA-Seq data.
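The abstract does not spell out how duplicates are detected, so the following is only a minimal sketch of one plausible approach, not the Bgee procedure itself: fingerprint each chip by hashing its probe intensities, then group chips that share a fingerprint. The function names, the `experiment/chip` identifier format, and the sample values are all hypothetical.

```python
import hashlib
from collections import defaultdict


def chip_fingerprint(probe_values):
    """Return a SHA-256 digest of a chip's (probe ID, intensity) pairs.

    Assumption: two chips are treated as duplicates only when every
    probe intensity matches to six decimal places.
    """
    digest = hashlib.sha256()
    # Sort by probe ID so the digest does not depend on row order.
    for probe_id, intensity in sorted(probe_values.items()):
        digest.update(f"{probe_id}\t{intensity:.6f}\n".encode("utf-8"))
    return digest.hexdigest()


def find_duplicated_chips(chips):
    """Group chip identifiers whose content fingerprints are identical."""
    groups = defaultdict(list)
    for chip_id, probe_values in chips.items():
        groups[chip_fingerprint(probe_values)].append(chip_id)
    # Keep only fingerprints shared by more than one chip.
    return sorted(sorted(ids) for ids in groups.values() if len(ids) > 1)


# Hypothetical data: the same chip submitted under two experiments.
chips = {
    "expA/chip1": {"probe_1": 10.5, "probe_2": 200.0},
    "expA/chip2": {"probe_1": 11.0, "probe_2": 180.25},
    "expB/chip7": {"probe_2": 200.0, "probe_1": 10.5},
}

duplicates = find_duplicated_chips(chips)
```

Exact-match hashing of this kind would only catch chips whose values are identical; duplicates that were renormalised or reprocessed before resubmission would need a different comparison.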

Similar resources

Detecting the undetectable: uncovering duplicated segments in Arabidopsis by comparison with rice.

Genome analysis shows that large-scale gene duplications have occurred in fungi, animals and plants, creating genomic regions that show similarity in gene content and order. However, the high frequency of gene loss reduces colinearity resulting in duplicated regions that, in the extreme, no longer share homologous genes. Here, we show that by comparison with an appropriate second genome, such p...


Hidden threats inducing medical errors in Tehran public hospitals

Introduction: Medical errors represent a serious public health problem and pose a threat to patient safety. This study aimed to determine the hidden threats in the incidence of medical errors at public hospitals in Tehran. Methods: In this descriptive study, the population included all process owners (12-person teams) in public hospitals of Tehran. The sample size was 396 individuals, sel...


Longest-path Algorithm to Solve Uncovering Problem of Hidden Markov Model

Uncovering problem is one of three main problems of hidden Markov model (HMM), which aims to find out optimal state sequence that is most likely to produce a given observation sequence. Although Viterbi is the best algorithm to solve uncovering problem, I introduce a new viewpoint of how to solve HMM uncovering problem. The proposed algorithm is called longest-path algorithm in which the uncove...


Uncovering Hidden Mathematics of the Multiplication Table Using Spreadsheets

This paper reveals a number of learning activities emerging from a spreadsheet-generated multiplication table. These activities are made possible by using such features of the software as conditional formatting, circular referencing, calculation through iteration, scroll bars, and graphing. The paper is a reflection on a mathematics content course designed for prospective elementary teachers usi...


Dynamics of embryonic stem cell differentiation inferred from single-cell transcriptomics show a series of transitions through discrete cell states

The complexity of gene regulatory networks that lead multipotent cells to acquire different cell fates makes a quantitative understanding of differentiation challenging. Using a statistical framework to analyze single-cell transcriptomics data, we infer the gene expression dynamics of early mouse embryonic stem (mES) cell differentiation, uncovering discrete transitions across nine cell states....


Journal title:

Volume 2013, issue

Pages -

Publication date 2013